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ABSTRACT 

Many platforms for genome-wide analysis of gene 
expression contain 'redundant' measures for the 
same gene. For example, the most highly utilized 
platforms for gene expression microarrays, 
Affymetrix GeneChip® arrays, have as many as ten 
or more probe sets for some genes. Occasionally, 
individual probe sets for the same gene report dif- 
ferent trends in expression across experimental 
conditions, a situation that must be resolved in 
order to accurately interpret the data. We developed 
an algorithm, SCOREM, for determining the level of 
agreement between such probe sets, utilizing a stat- 
istical test of concordance, Kendall's W coefficient 
of concordance, and a graph-searching algorithm 
for the identification of concordant probe sets. We 
also present methods for consolidating concordant 
groups into a single value for its corresponding 
gene and for posf hoc analysis of discordant 
groups. By combining statistical consolidation with 
sequence analysis, SCOREM possesses the unique 
ability to identify biologically meaningful discordant 
behaviors, including differing behaviors in alternate 
RNA isoforms and tissue-specific patterns of 
expression. When consolidating concordant behav- 
iors, SCOREM outperforms other methods in 
detecting both differential expression and over- 
represented functional categories. 

INTRODUCTION 

The Gene Expression Omnibus (GEO) database at NCBI 
is an extensive repository of publicly available, 
genomic-scale data, including gene expression data. The 
most common platforms in GEO are Affymetrix 
GeneChip® arrays (http://www.ncbi.nlm.nih.gov/geo). 
One significant issue with analyzing this type of data is 



that, for any given gene, a GeneChip®can contain more 
than one probe set designed to hybridize to the tran- 
script(s) for that gene. In many gene expression studies, 
a gene is stated to be differentially expressed if any one of 
its representative probe sets reports differential expression, 
without regard for the other probe sets. Ideally, a group of 
probe sets representing the same gene will always behave 
concordantly, i.e. report similar measures of differential 
expression. However, this is not always the case. In 
order to obtain the maximum and most accurate informa- 
tion possible, therefore, redundant probe sets must be 
addressed. 

Various approaches to dealing with redundant measures 
of gene expression have been proposed, from naive to 
elaborate. Naive approaches include choosing the probe 
set with the highest variance (1) or the best _P- value (2), or 
simply accepting an average value of the majority of probe 
sets. More elaborate methods make use of statistical tests 
to analyze the agreement between probe sets (3,4). 
Another type of methodology is to reannotate all 
uniquely assignable probes based on sequence ahgnment 
to a given gene (5-7) or specific transcript sequence (8-10) 
into one gene- or transcript-specific probe set. Each of 
these approaches has its limitations. In order to 
overcome those limitations and to take full advantage of 
all the data available from a given experiment, we have 
developed the SCOREM algorithm. 

The SCOREM algorithm utilizes Kendall's W coeffi- 
cient of concordance, converts the results to an adjacency 
matrix and then performs a search for connected compo- 
nents (subgraphs) to identify concordant and discordant 
groups of probe sets. We then use sequence ahgnment to 
form biological interpretations of discordant observa- 
tions. The SCOREM algorithm improves the interpret- 
ation of redundant gene expression measures both by 
adding statistical confidence to concordant measures, 
yielding improved detection of differential expression 
and functional enrichment compared to other methods, 
and by finding meaningful biological reasons (including 
the prediction of novel alternate RNA isoforms) for 
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discordant measures, something no statistics-only or 
sequence-only method can deliver. Although in the re- 
mainder of this article we refer only to Affymetrix micro- 
arrays, the same issue of redundant probe sets also exists 
for other gene expression platforms, and the SCOREM 
algorithm is inherently applicable to those as well. 



MATERIALS AND METHODS 

Data sources and processing 

Eight sets of data including four different Affymetrix 
GeneChip® arrays were used in this study; five are 
publicly available through NCBI's GEO website 
(Table 1). Raw data (where available) were normalized 
using the R package gcrma (19). Filtering was performed 
with genefilter (20), and fold change (m) and significance 
(P) were calculated with eBayes in the limma package (21). 
Generally, statistical significance is determined using hard 
cutoffs, e.g. a value of m greater than 2 with P better than 
0.01. Here, we have implemented a shding scale for fold 
change and significance cutoffs: the critical value for P is 
a, a = 0.01 when \m\ = 1; a increases up to Pmax (e-g- 0.05) 
when \m\ > 1 and decreases when \m\ < 1. The critical 
value for P for any given value of m is given by a 
sigmoid function, similar to that described in Ref. (22). 
This shding scale can be utilized with or without prior 
/•-value adjustments, such as false discovery rate (FDR) 
correction (23). The analyses of the data sets specified 
were completed with and without FDR correction. 
Correction with an FDR of 0.05, however, can be more 
or less stringent than using a hard cutoff of /"< .01, de- 
pending on the sample. The use of FDR correction is 
further discussed in the context of consohdation of con- 
cordant probe sets. 

The analyses discussed below was carried out in R, 
using Bioconductor annotation packages Mouse4302.db 



version 2.5 and org. Mm. eg. db version 2.5 in Bioconductor 
release 2.8. 



Measurement of concordance 

Kendall's W, or Kendall's coefficient of concordance, is a 
statistical method used to measure inter-rater rehability 
(24). Originally devised for subjective human observations 
in psychological studies, it has been used for the analysis 
of biological data such as species associations in commu- 
nity ecology (25). Unhke other tests of concordance, 
Kendall's W does not consider absolute values of 
ratings, but overall ordering, as it is a rank-sum method 
based on Spearman's p correlation coefficient. When 
applied to a group of probe sets, a significant value of 
W (as defined below) indicates a high level of concordance 
across a group of probe sets. Groups with a value below 
the critical value are discordant. 

Kendall's W is calculated from the mean of all pair-wise 
Spearman's coefficients as follows (24): 



W 



{m- l)p+ 1 



(1) 



where k is the number of judges (probe sets), kC2 
(k choose 2) is the number of pairs of judges, and p is 
the mean Spearman's correlation coefficient. It can 
further be seen that 



E P/ 2 E Pi 



P 



kCj k(k-l) 
which when substituted into Equation (1) gives 

IZPi + k 

W = 



(2) 



(3) 



Table 1. Data used in evaluating the SCOREM algorithm 



Experiment 


Tissue type 


Conditions 


Platform 


Source (Raw") 


No. of 
samples 


GSE3678'' 


Thyroid 


Tumor versus normal 


Human Genome U133 Plus 2.0 (GPL570) 


GEO (yes) 


14 


GSE4051 (11) 


Retina 


Nrl-knockout versus wild-type at 
5 time points in development 


Mouse Genome 430 2.0 (GPL1261) 


GEO (yes) 


39 


GSE4799 (12) 


Spermatogonial 
stem cells 


Growth factor restoration (3 time 
points) versus withdrawal and 
baseline 


Mouse Genome 430 2.0 (GPL1261) 


GEO (no) 


15 


GSE9371 (13) 


Aorta 


Estrogen versus placebo in wild 
type, ERa- and ERP- 
knockouts 


Mouse Genome 430 2.0 (GPL1261) 


GEO (yes) 


22 


BrCa^ (14-16) 


Breast cancer 
(MCF7 cell line) 


Estrogen versus placebo at 2 time 
points 


Human Genome U133A (GPL96) 


GEO (no) 


26 


VSMC (17) 


Aorta 


Estrogen versus placebo at 3 time 
points 


Mouse Genome 430A 2.0 (GPL339) 


Authors (yes) 


18 


GonFat (18) 


Gonadal fat 


Female versus male and female 
versus ovarioectomized female 


Mouse Genome 430 2.0 (GPL1261) 


Authors (yes) 


21 


IngFat (18) 


Inguinal fat 


Female versus male and female 
versus ovarioectomized female 


Mouse Genome 430 2.0 (GPL1261) 


Authors (yes) 


21 



"Whether or not raw data (.CEL files) were available or only preprocessed data. 
''Reyes el al. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi7acc = GSE3678. 

"This data set consists of similarly treated samples from GEO series GSE4006, GSE4025 and GSE9936. 
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The significance of an individual value of p is 
determined using the Student's /-test for significance 
with n — 2 degrees of freedom, where n is the number of 
observations (arrays) (26). 

A critical value (cutoff) for p is calculated by 
rearranging the terms of this equation, using an appropri- 
ate critical value for t (e.g. / < 0.01). 

Pcrit = 5" 9 

V '?rit + n-2 

The critical value for W is then calculated as half the 
distance between Pcrit and 1 (a perfect correlation), making 
the W cutoff more stringent than that for an individual p. 

„r Pcrit + 1 
W^crit = 2 



Detection of concordant subgroups 

If the entire group's W does not meet the cutoff W^crit. a 
search for concordant subsets within the group (sub- 
groups) is performed. It is useful to note at this point 
that Equation (3), from the rearrangement of Equation 
(1), is the same as taking the mean of the symmetric 
matrix of pairwise coefficients (Table 2). This matrix is 
then converted into a matrix of true/false values, based 
on each p > Pcrk- This creates an adjacency matrix for an 
undirected graph, reducing the problem of finding con- 
cordant subgroups to that of finding connected compo- 
nents within the graph. If a group is insufficiently 
concordant (connected under p > Pcrit but W < W^n), a 
recursive search for concordant subgroups is performed. 

Consolidation of concordant groups 

When concordant groups or subgroups are found, a 
combined /"-value can be obtained by performing 
Fisher's method [Equation (7)] and obtaining the 
_P-value for the resulting /" value (27). The individual 
_P-values being combined are those generated by eBayes 
(21) for each probe set within the group. 

X- = -2^1og(P,) (7) 



Table 2. Calculation of W from the mean of the matrix of all 
pairwise correlation coefficients between k judges 
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Fisher's method is intended for independent measure- 
ments (27). Due to the design and experimental protocol 
under which the gene expression arrays are performed, 
redundant probe sets can indeed be considered independ- 
ent measures. The standard Affymetrix sample prepar- 
ation protocol calls for fragmentation of RNA (resulting 
fragments are 35-200 bases long), therefore the pool of 
fragments hybridizing to a given probe set is distinct and 
independent from that hybridizing to another. For cases in 
which there are redundant probe sets for the same gene, 
some represent uniquely processed transcripts; in theses 
cases, hybridization is clearly independent. However, 
even when probe sets represent the same transcript, 
there is minimal overlap between regions covered by 
those probe sets: the average overlap is 2-3%, with the 
majority (80-85%) having no overlap at all. Thus, Fisher's 
method is an appropriate choice. 

Finally, each group or subgroup is given a new anno- 
tation that indicates which probe sets are included, and 
the total size of the group, e.g. for the gene with Entrez 
Gene identifier 99 999, new annotations 99999.1_2_3.5, 
99999.4.5, and 99999.5.5 indicate that probe sets 1, 2 
and 3 out of 5 have been consolidated, while 4 and 5 
have been left as individual values. When there is more 
than one final value for a given gene, a post hoc analysis is 
performed, to determine which value(s) makes the most 
sense, biologically speaking, to use. When necessary, 
probe sequences were aligned to the most current 
version of their annotated RefSeq transcripts and gene 
exon tables downloaded from NCBI (28). When probe 
sets failed to ahgn to the NCBI transcript sequences, 
further analysis was performed using the UCSC Blat 
website (29,30). 

R software package SCOREM 

AU the programs needed to carry out this analysis have 
been included in an R software package, available for 
download from the NAR website (Supplementary Data). 
Requirements are a normalized ExpressionSet object (as, 
for example, produced by gcrma) and an MArrayLM 
object with P values, such as produced by eBayes. 
Appropriate annotation packages must also be available. 
The SCOREM package includes methods for determin- 
ation of concordance, consohdation of concordant 
groups and determination of differential expression, as 
weU as detection of discordant groups remaining after 
consohdation. Further analysis can be performed using 
UCSC Blat website or the stand-alone Blat server 
package available for download. Output of the Blat 
software (.psl files) can be visualized on the UCSC 
Genome Browser website, or the Integrated Genome 
Browser application. Since transcript annotations change 
on a daily basis, ahgnment and visualization of probe sets 
of interest can be repeated as needed. 

RESULTS 

Redundant probe sets on Affymetrix arrays 

The three most common platforms in GEO are the 
Affymetrix Human Genome U133 Plus 2.0, the Human 



e46 Nucleic Acids Research, 2012, Vol. 40, No. 6 



Page 4 OF 12 



12000 



10000 - 



8000 



6000 



4000 



2000 



■ Human Genome U1 33 Plus 2.0 
□ Human Genome U133A 
H Mouse Genome 430 2.0 



23456789 
Number of probe sets per gene 



10+ 



Figure 1. Distribution of the number of probe sets per gene on the 
three indicated popular Affymetrix GeneChip* arrays. 



Genome U133A and the Mouse Genome 430 2.0 
GeneChip® arrays. On any of these arrays, a gene may 
be represented by one or more probe sets. For instance, 
the U133 Plus 2.0 array averages 2.8 probe sets per gene 
(54 675 probe sets representing 19 621 genes), while the 
smaller 133 A array averages 1.8 probe sets per gene. 
Overall, about half of all genes are represented by more 
than one probe set; a few are represented by ten or more 
probe sets (Figure 1). Ideally, all the probe sets for a gene 
would hybridize concordantly, which would provide 
added confidence to the behavior being observed. 
However, on occasion some groups of probe sets instead 
behave discordantly, for a variety of reasons: cross- 
hybridization to another transcript, misannotation or 
alternate-transcript-specific binding (which may also be 
cell type-specific). Such groups of discordant probe sets 
must be identified and analyzed further, to determine the 
mostly likely signal for that condition in the biological 
samples. 

Several approaches to resolve the issue of different 
results from supposedly analogous probe sets have been 
proposed. The simplest approaches for deahng with re- 
dundant probe sets do not examine their relative behav- 
iors, but instead employ naive methods such as choosing 
the probe set with the highest variance (1), choosing the 
one with the best /"-value (2) or taking a 'majority rules' 
approach. Other approaches use statistical tests to analyze 
the level of agreement among the data. Jaksik et al. (3) use 
Dixon's Q test to detect probe sets that are statistical 
outliers, having significantly higher or lower hybridization 
levels than the rest, but they do not examine the level of 
agreement in behavior across conditions. Li et al. (4) use 
ANOVA to attempt to distinguish probe sets that behave 
concordantly from those that do not. Their analysis, 
however, takes an all-or-none approach, ignoring the 
possibihty that some subset of the probe sets might 
be in agreement. Other groups have implemented a 



sequence-based approach to redefine the probe sets on 
the Affymetrix GeneChip® arrays in order to create a 
one-to-one mapping of probe sets to genes (5-7) or tran- 
scripts (8,9). Each of these approaches has its advantages 
and its limitations, the latter of which the SCOREM 
algorithm is designed to overcome. 

SCOREM: statistical consolidation of redundant 
expression measures 

A typical microarray analysis involves pre-processing the 
raw data from .CEL files, using linear modehng to fit the 
samples to conditions, and determining fold change 
(m-value) between conditions for each probe set 
(Figure 2a). Optionally, filtering may be performed to 
remove probe sets reporting extremely low expression or 
variabihty. This step is required with the consolidation 
approach in order to remove probe sets with zero 
variance, which cannot be included in correlation calcula- 
tions. Additionally, confidence levels (/"-values) are 
generated for every m-value, and cutoffs may be applied 
to the m- and P-values to extract a hst of genes that are 
statistically differentially expressed ('Materials and 
Methods' section). In this analysis, we add two additional 
steps (Figure 2b). First, the processed data are analyzed 
for concordance of gene groups. Where groups are not in 
concordance, they are annotated into concordant sub- 
groups (or individually, as warranted). Second, after the 
m- and /"-values are calculated, concordant groups/sub- 
groups are consoHdated to one pair of values. 

Concordance is measured with Kendall's W test 
('Materials and Methods' section) (24). When a group is 
found not to be concordant, the group is searched for 
subgroups that are concordant. Probe sets are then 
reannotated with their new group ids — all the same for 
groups with complete concordance, or distinct ids for 
two or more subgroups. In the most discordant case, 
each probe set remains as a separate value. When the 
analysis is complete, each consohdated (sub)group will 
have a single m-va.\\xe (fold change) with a corresponding 
/"-value for each comparison performed in the experiment 
(e.g. treatment versus control). To determine m for a con- 
cordant (sub)group, the individual m-values are averaged, 
/'-values are combined using Fisher's method (27). The 
resulting values are tested for statistical significance as 
described in 'Material and Methods' section. The final 
list of differentially expressed genes can then be used for 
further analysis, such as enrichment of functional 
categories. In order to understand the biological basis 
for any probe set groups that were not fully consolidated, 
they are examined further by sequence alignment to the 
current version of the transcript/gene. 

Application of SCOREM 

In all, we applied the SCOREM algorithm to eight data 
sets, on four different Affymetrix arrays (Table 1). On 
average, 41% (range 6-56%) of the redundant probe 
sets were able to be completely consohdated, 19% were 
partially consoHdated, and approximately 40% were not 
able to be consolidated at all (Table 3). The genes in the 
latter two categories are examined further, to examine 



Page 5 OF 12 



Nucleic Acids Research, 2012, Vol. 40, No. 6 e46 



^Raw Data (.CEL files)^ 



(b) 





r 


Preprocessing 

(background correction, 
CDF-based summarization 
of probes, normalization) 


> 


r 


Probe Set Filtering 

(optlonai) 


> 




Processing 

(linear modeiing and 
contrast fitting) 







^Raw Data (.CEL files)^ 


OR 


^ Preprocessed Data ^ 






1 ^ 1 


Preprocessing 




Filtering and Processing 


> 



Group probe sets 
by Entrez Gene IDs 



Calculate W 




Reannotate and Consolidate 
Concordant (Sub)Groups 



Accept Single 
Values 



Are m- and p-values 
significant? 

Tves 



r - - ^ 

I Post-hoc Analysis 

(sequence alignment) 



List of Differentially 
Expressed Probe Sets 




( List of Differentially | 
I Expressed Genes J 



Figure 2. Flowchart for algorithms for analyzing gene expression data, (a) Typical Affymetrix processing, (b) SCOREM processing with concord- 
ance testing and consolidation. 



Table 3. Degree of probe set consolidation by the SCOREM algorithm in eight data sets 



Experiment 



Probe sets (Genes)" 



Consolidation level (%) 





On GeneChip® 


After filtering 


Unannotated 


Single 


Redundant 


After SCOREM 


None 


Partial 


Complete 


GSE3678 


54 675 (19 798) 


18725 (10516) 


2578 


6802 


9345 (3714) 


6668 (3714) 


44 


19 


37 


GSE4051 


45 101 (20 757) 


16 992 (9963) 


1996 


6645 


8351 (3318) 


5019 (3318) 


27 


19 


54 


GSE4799 


45 101 (20 757) 


44062 (20459) 


6597 


11006 


26459 (9453) 


24 192 (9453) 


80 


15 


5 


GSE9371 


45 101 (20 757) 


20728 (11 184) 


1937 


6512 


12 279 (4672) 


9094 (4672) 


47 


22 


30 


BrCa 


22 283 (12718) 


13 397 (8558) 


982 


5939 


6476 (2619) 


4031 (2619) 


28 


19 


53 


VSMC 


45 101 (20 757) 


19914 (11 717) 


1012 


7171 


11731 (4546) 


8224 (4546) 


44 


19 


37 


GonFat 


45 101 (20 757) 


24 260 (12 586) 


2472 


7073 


14715 (5513) 


9320 (5513) 


30 


23 


48 


IngFat 


45 101 (20757) 


24 364 (12 585) 


2590 


7080 


14 694 (5505) 


8966 (5505) 


28 


20 


52 



"Numbers in parentheses indicate the number of distinct genes represented by those probe sets. 



potential reasons for the discordant behaviors of their 
probe sets. 

In comparing any two conditions A and B (e.g. drug 
treatment versus control, mutant versus wild type, or 
tumor versus normal tissue), a given probe set or group 
of probe sets can be (i) higher in A than in B, (ii) lower in 
A than in B or (iii) the same in A and B (no change). Thus, 
types of discordant behavior we can examine include when 
one probe set or subgroup reports differential expression 
(either higher or lower) in a given condition while another 
probe set or subgroup is unchanged. Another occurs when 
two groups report differential expression in opposite dir- 
ections. A third can be considered to occur when both 
groups report differential expression, in the same direc- 
tion, but of very different magnitudes (e.g. two-fold 
versus eight-fold increases). In an experiinent with more 
than two conditions (e.g. a time course or testing multiple 
different drugs), discordant subgroups can exhibit one or 



more of these behaviors. Table 4 summarizes, for each 
data set, how many instances of the first two types of 
discordant behaviors exist among the genes reported to 
have changing expression levels in more than one 
subgroup of probe sets. 

Post hoc analysis and the detection of differential 
processing 

Potential reasons for discordant behavior among probe 
sets representing the same gene include multiple RNA 
isoforms for the gene, misannotation of probe sets and 
cross-hybridization of multiple mRNAs to a single probe 
set. Multiple RNA isoforms can be generated via alterna- 
tive splice sites, polyadenylation sites and/or promoters. 
To determine whether probe set inconsistencies in hybrid- 
ization levels could be due to biologically significant dif- 
ferences, the probes in each probe set or group are 
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Table 4. Characterization of differentially expressed groups represent- 
ing differentially expressed genes from eight gene expression data sets 

Data set Total Number of genes represented by 

number 

of groups A single Multiple subgroups'* 

group 

(a) Yes (b) Small (c) Down 
versus versus versus 
no large up 



GSE3678 


1430 


1383 


0 


5 


0 


GSE4051 


4006 


3656 


126 


24 


13 


GSE4799 


5122 


4239 


348 


16 


43 


GSE9371 


1593 


1509 


32 


3 


2 


BrCa 


947 


915 


5 


1 


0 


VSMC 


451 


447 


1 


0 


0 


GonFat 


4559 


4449 


14 


3 


13 


IngFat 


3049 


2933 


32 


2 


6 



"Columns refer to discordant groups where (a) one is changing and the 
other is not; (b) both are changing in the same direction, but with 
different magnitudes; and (c) one is increasing while the other is 
decreasing. 



mapped to the sequences for all annotated transcript 
variants of that gene. 

Through such mapping we have also vahdated the rele- 
vance of the SCOREM approach, as many of the 
discordancies can be attributed to known cases of alterna- 
tive RNA transcripts. In the data sets we analyzed, there 
were 77 cases in which discordantly expressed subgroups 
for the same gene were expressed in opposite directions 
(Table 4, final column). The 76 genes (one is discordant in 
two data sets) are represented on the arrays by 298 total 
probe sets (199 after gene filtering). For 26 of these genes, 
probe sets differentially mapping to distinct, known RNA 
isoforms can account for the discrepancy (Supplementary 
Table SI); the alternate RNA isoforms are annotated in 
RefSeq (28) for 15 of these genes and in the UCSC 
Genome Browser (29) for an additional 11 cases. These 
reflect all types of alternative transcript formation: alter- 
native splicing (fifteen genes), alternative polyadenylation 
(nine genes) and alternative promoters (two genes). 

One example where discordant probe sets map to 
distinct transcripts occurs in Shprh (encoding SNF2 
histone hnker PHD RING helicase). This gene is repre- 
sented by four probe sets on the Affymetrix Mouse 
Genome 430 2.0 array. In the GSE4051 data set, 
SCOREM measures the correlation coefficient W to be 
0.42, not significantly concordant. However, when the dis- 
cordant subgroups are analyzed, it is clear that two of the 
probe sets reflect discordant expression values; hybridiza- 
tion to one (1452261_at) increases when that to the other 
(1457327_at) decreases. There are two known RefSeq 
transcripts for Shprh, NM_1 72937 and NM_00 1077707 
(Figure 3a); due to differential splicing these transcripts 
contain distinct 3' untranslated regions (UTR). AHgning 
the probe sequences to the genomic sequence of Shprh 
reveals that each of the discordant probe sets maps exclu- 
sively to only one of the two different isoforms. Thus, it is 
straightforward to interpret that under conditions of this 
experiment (in the mouse retina in response to a knockout 



of the Nrl gene), when expression of the Shprh transcript 
NM_00 1077707 increases, expression of transcript 
NM_1 72397 decreases. Thus, the SCOREM analysis 
uncovers novel biological information regarding expres- 
sion of this gene. 

For an additional 35 genes in which discordantly ex- 
pressed subgroups were expressed in opposite directions, 
known isoforms did not distinguish behavior between 
probe sets. Instead, mapping of the probe sets suggest 
novel RNA isoforms. The simplest interpretations of 
various observed scenarios include: (i) when discordant 
subgroups of probe sets map to distinct coding exons, 
we propose novel alternatively spliced variant(s); (ii) 
when a discordant subgroup maps to an intron between 
coding exons, we propose alternate isoforms containing 
novel exons; (iii) when discordant subgroups map to 
distinct regions within the known 3'UTR or when some 
map downstream of the known 3'UTR, we propose alter- 
native polyadenylation; (iv) when a discordant subgroup 
maps either to regions upstream of the identified promoter 
or to an intron prior to the encoded translational initi- 
ation codon, we propose an alternative transcription 
start site (promoter). Other interpretations of novel 
RNA isoforms for the respective genes are clearly 
possible. The analysis does not pinpoint the alternate 
RNA isoform, but it does suggest that more than the 
known RNAs are being expressed, for subsequent experi- 
mental exploration. Finally, it is possible (depending on 
the source of RNA used in the experiments) that signal 
from probe sets (especially ones in introns) reflect either 
partially processed RNA or RNAs unconnected to the 
gene in question (e.g. non-coding RNAs overlapping the 
genomic region). 

Examples of three categories of genes containing dis- 
cordant probe sets that cannot be explained by known 
RNA isoforms are shown in Figures 3b-d. Dpysl2 is rep- 
resented by discordant probe sets that both map (partially 
or completely) in the 3'UTR (Figure 3b); this is an 
example of potential alternate polyadenylation sites in 
the terminal exon. For Scmhl, one discordant probe set 
maps in the intron upstream of the first coding exon 
(Figure 3c); this is predicted to reflect an alternate 
promoter, most hkely mapping within the annotated 
intron 4. For MlltlO, there are two probe sets mapping 
to annotated introns 3 and 4, respectively, where their 
expression levels change in opposite directions 
(Figure 3d). This scenario suggests alternative splicing, 
presumably including novel coding exons within the 
annotated introns 3 and/or 4. 

Finally, 16 of the 77 cases could not be explained by the 
existence of alternate RNA isoforms. For these, the use of 
BEAT (30) to align the individual probe sequences against 
the entire genome can reveal whether some of the probes 
do not appropriately represent the gene in question 
(misannotation) or represent more than one gene 
(cross-hybridization). For eight of these genes, at least 
one of the probe sets is misannotated (including probe 
sets that map to the non-coding strand, or to non-coding 
RNAs overlapping the given gene). Two genes are repre- 
sented by a probe set that has the potential to 
cross-hybridize to another gene on a separate 
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(d) MlltIO (17354) 
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(trithorax homolog, Drosophila); translocated to, 10 
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Figure 3. Examples of post hoc analysis in order to determine causes for discordant expression patterns. Genomic DNA is indicated as a line, with 
exons indicated as boxes on the line. Nucleotide positions within each genes are given with zero representing the presumed transcription start site for 
the indicated mRNAs. Mapping of each probe set is color-coded for each gene. Translation start and stop codons are indicated by arrows and 
asterisks, respectively. Whether each probe set is differentially regulated in the experimental samples, and the direction of regulation, are indicated in 
parentheses after each probe set id. (a) Detection of different expression patterns in known alternate RNA isoforms of Shprh. The probe sets 
reporting differential expression correspond to exons unique to each of the known isoforms. Expression changes are for experiment GSE4051 at the 
post-natal day 10 time point, (b) Identification of a possible novel isoform of the gene Dpysl2. The alternate RNA isoform could involve alternate 
splicing of exons 12-14 or an alternate 3'UTR (polyadenylation site). Expression changes are for experiment GSE4799, deprivation versus untreated. 

(c) Identification of a potential alternate promoter in Scmhl. Expression changes are for experiment GSE4051 at the embryonic day 16 time point. 

(d) Identification of potential novel coding exons in MlltIO, in introns 3 and/or 4. Expression changes are for experiment GSE4799, 2h 
post-restoration versus untreated. 



chromosome. Finally, six genes do not include annotated 
exon information, and cannot therefore be assessed. Full 
details for all 77 cases are provided in Supplementary 
Table SI. 

Overall, the abihty to identify and map the locations of 
probe sets that reflect discordant expression patterns 
expands and clarifies the information gleaned from 
analyzing gene expression data. As a validation that 



SCOREM appropriately separates probe sets into discord- 
ant subgroups, over a third of the more severely discord- 
ant subgroups can be explained by distinct, known RNA 
isoforms. For these genes, the discordant expression 
behavior can add valuable biological insights. In ~45% 
of the cases, our analysis suggests novel RNA isoforms 
that are differentially regulated, thus generating 
hypotheses to be tested for genes of interest. Finally, by 
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invalidating probe sets in the remaining ~20% of the dis- 
cordant cases, incorrect data can be excluded from further 
consideration. 

Detection of tissue-specific behaviors 

Since SCOREM performs a new analysis of concordance/ 
discordance in each experiment, the results for any given 
group of probe sets may differ from experiment to experi- 
ment. For instance, this could reflect tissue- or cell 
type-specific expression of alternate variants. An interest- 
ing case in point is that of Rbm39 (Entrez Gene ID 
170791). This gene is represented by 10 different probe 
sets, according to the annotation supplied by 
Affymetrix. In the analysis of the GSE9371 data sets, 
the 10 probe sets were not concordant (W = 0.47) 
(Figure 4a), but two subgroups were highly concordant 
(Figure 4b and c). If the naive approach of 'majority 
rules' had been applied, the first group (Figure 4b) with 
five probe sets would have been taken as the representative 
value. However, mapping probes to the transcript 
sequence indicates that only the smaller subgroup of two 
probe sets (Figure 4c) maps to the annotated (exonic) 
transcript sequence (Figure 4d). The larger subgroup 
maps instead to intronic regions of the gene. Whether 
these represent a separate transcript (e.g. non-coding 
RNA, overlapping gene) or unannotated alternate exons 
requires further investigation. Most importantly. 
Figure 4e-i shows that the clustering of probe sets is dif- 
ferent in different experiments. In particular, note that 
probe sets 2 and 3 do not always behave concordantly, 
and in one case all 10 probe sets act in concert. Using the 
standard analysis from Figure 2a, these experiment- 
dependent differences would not be discerned. Thus, the 
method described here is sufficiently flexible to provide 
important additional nuances in understanding the differ- 
ential RNA processing. 

We have examined the possibihty that some genes 
would always have discordant probe sets, and have 
found that not to be the case. In the eight data sets 
examined, only one gene [WD repeat domain 33 
(Wdr33), EntrezGene 74320] was discordant in more 
than one set (GSE4051 and Inguinal Fat). However, 
even in this case, the seven probe sets representing this 
gene were not grouped in the same manner, and so were 
discordant in different ways (see Supplementary Table SI 
for details). This again leads to the conclusion that 
patterns of concordance and discordance cannot be 
assumed to be replicable across data sets. 

DISCUSSION 

The SCOREM algorithm is a novel approach to gene ex- 
pression analysis, making use of Kendafl's coefficient of 
concordance to determine the level of agreement (con- 
cordance) among multiple probe sets representing the 
same gene in a gene expression experiment. While other 
methods have been proposed to resolve the issue of 
so-called redundant probe sets, SCOREM is the only 
one that also provides information about differential 
RNA processing, even when no known alternate 



transcripts exist, and thus can reveal tissue- or ceU-specific 
expression patterns. At the same time, it yields expression 
measures with increased statistical confidence, leading to 
further statistical confidence in secondary analyses such as 
functional enrichment analysis. 

SCOREM is evaluated here in comparison with a 
standard (non-consolidating) approach to expression 
analysis, as weU as two other methods for consolidation. 
Li et al. (4) use ANOVA to analyze agreement between 
multiple probe sets, while Dai et al. (5) use sequence ahgn- 
ment information to create composite probe sets. We 
evaluate each method for its ability to detect differentially 
expressed genes and the statistical confidence of subse- 
quent functional enrichment analyses [measured as the 
average P-value of the top 20 categories detected as 
enriched by GOstats (1)]. 

Comparison with a standard approach 

OveraU, SCOREM consistently caUs more differentially 
expressed genes than the standard method (Table 5). 
This includes more genes in each statistically 
over-represented functional category, resulting in better 
_P- values of the top categories identified by GOstats (1) 
(Table 5). 

For example, our previous studies investigated the role 
of estrogen and estrogen receptors (ER) a and p in 
regulating gene expression in vascular tissue (GSE9371) 
(13). This study found that estrogen/ERp was responsible 
for downregulation of a number of nuclear-encoded genes 
in the mitochondrial respiratory chain. Using a standard 
analysis, 17 genes in the oxidative phosphorylation 
pathway (KEGG:00190) were identified that had at least 
one probe set reporting differential expression. Using 
SCOREM, the hst was extended to 26 genes (14 from 
the original list plus 12 new genes); consequently, the 
/"-value for the statistical significance of this pathway went 
from 1.9 X lO"""* to 1.8 x 10"^ (Supplementary Table S2). 

Since the standard approach makes no attempt to con- 
sohdate redundant probe sets, the resulting hst of differ- 
entially expressed genes typically includes values from 
redundant probe sets, with no mention of other probe 
sets corresponding to these genes. There were a total of 
over 3000 redundant measures in the hsts for the eight 
data sets generated by the standard method, compared 
to only 334 remaining redundant measures for SCOREM. 

Comparison with a statistical approach 

The approach developed by Li et al. (4) utilizes ANOVA 
with the intent of distinguishing between the effects of 
discordancy on expression measures and true experimen- 
tal effects. This analysis takes an all-or-none approach, in 
which all probe sets that are in agreement are consolidated 
while probe sets not in agreement are all treated as 
separate measurements, ignoring the possibihty that 
some subset might be in agreement and that this could 
be the source of valuable information about the gene in 
question. However, as SCOREM demonstrates, in the 
majority of cases concordant subgroups can be delineated 
(Table 3). For example, the ANOVA approach (4) would 
treat aU probe sets as separate signals for Lmanl, even 
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Figure 4. Graphical analysis of probe sets annotated as representing Rbm39 (encoding RNA binding motif protein 39) from multiple experiments 
performed on the Mouse 430 2.0 GeneChip®. (a-c) Expression profiles of probe sets across the 22 samples in the GSE9371 data set; W indicates level 
of concordance, (a) All 10 probe sets; W shows lack of concordance. A dotted line shows a probe set removed by gene filtering (for very low 
expression or very low variance) and therefore not included in the calculation of W. (b) The largest subgroup includes probe sets 4, 6, 7, 9 and 10; 
W shows high concordance, (c) The second subgroup includes probe sets 2 and 3; W shows high concordance, (d) Mapping of the 10 probe sets to 
the genomic sequence of Rbm39 and the exons of its only known transcript. Probes mapped above the line map to the coding (negative) strand, those 
below the line map to the non-coding (positive) strand, (e-i) Connected subgraphs indicating groups of concordant probe sets in five different 
experiments. In parentheses are the number of samples in each data set (n) and the critical value for W for a data set of that size. W is given for each 
group as a whole (bottom) and for each concordant subgroup (below each subgroup). Filled circles indicate statistically significant differential 
expression in that experiment. A dotted circle indicates a probe set removed during gene filtering in that data set. 



though three out of five probe sets are hybridizing 
concordantly and can be consolidated into one statistically 
significant increasing value, while a fourth is decreasing 
(Supplementary Table SI). Finally, when directly 



compared, SCOREM finds more differentially expressed 
genes and shows better enrichment of experimentally 
relevant functional categories than the ANOVA 
approach (Table 5). 
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Table 5. Comparison of SCOREM with other methods in terms of detection of differential expression and functional enrichment 



Data Set 


Standard (+FDR) 


ANOVA 


Custom CDF 


SCOREM (+FDR) 


Number of j 


^enes called differentially expressed 








GSE3678 


1026 (1064) 


402" 


667 


1405 (821) 


GSE4051 


2829 (2400) 


313" 


1678 


3829 (1930) 


GSE4799 


2146 (323) 


832" 


na'' 


4660 (298) 


GSE9371 


720 (102) 


NA 


480 


1550 (123) 


BrCa 


73 (1137) 


NA 


NA*" 


931 (592) 


VSMC 


80 (92) 


NA 


126 


449 (53) 


GonFat 


1753 (6005) 


NA 


1966 


4502 (3455) 


IngFat 


777 (3925) 


NA 


940 


2989 (1782) 


Average P-value of top 20 enriched GO categories 








GSE3678 


4.5 X 10"'" (1.8 X 10"') 


3.5 X 10"^" 


2.5 X lO"*" 


1.3 X 10"'" (5.1 X 10"') 


GSE4051 


6.1 X 10"'° (5.1 X 10"**) 


NA 


2.0 X lO""* 


1.6 X lO"*" (4.9 X 10"**) 


GSE4799 


3.6 X 10"'" (5.3 X 10"') 


1.2 X 10"^" 


NA'' 


2.0 X 10"'^ (2.8 X 10"'") 


GSE9371 


6.1 X 10"' (3.0 X 10"') 


NA 


8.6 X lO"*" 


1.8 X 10"' (9.5 X lO"*") 


BrCa 


3.6 X 10"" (5.0 X 10"') 


NA 


NA'' 


6.3 X 10"' (3.0 X 10"') 


VSMC 


NA (NA) 


NA 


9.8 X 10"" 


4.5 X 10"" (NA) 


GonFat 


3.2 X 10"" (2.5 X 10"'^) 


NA 


1.2 X 10"' 


8.3 X 10"'^ (1.7 X 10"") 


IngFat 


1.6 X 10"** (4.4 X lO"') 


NA 


7.9 X lO"*" 


3.5 X 10"** (1.9 X 10"**) 



"Number of differentially expressed genes or average /"-values of top three or five GO categories as given in Ref (4). 
""NA indicates data sets with no raw data available, where the custom CDF approach could not be applied. 

''/'-values are for top 5 (Custom CDF) or top 10 (SCOREM) overrepresented GO categories; NA indicates fewer than five GO categories with any 
overrepresentation. 



Table 6. Comparison of the number of probes, probe sets used and 
genes represented in Affymetrix and Brainarray custom CDF files 



Array 


CDF 


Probes 


Probe sets 


Genes 


Human U133 Plus 2.0 


Affymetrix 


604258 


54675 


19 798 




Custom 


277 789 


19 008 


18 974 


Human U133A 


Affymetrix 


247 965 


22 283 


12718 




Custom 


167 345 


12 078 


12 065 


Mouse 430 2.0 


Affymetrix 


496468 


45 101 


20757 




Custom 


240917 


17 306 


17 289 



Comparison with a custom CDF approach 

Dai et al. (5) use a sequence-based approach to generate a 
custom Chip Definition File (custom CDF) that redefines 
the probe sets on the Affymetrix GeneChip*' arrays to 
generate a single probe set per gene. Probes that match 
more than one transcript (or no annotated transcripts) are 
removed entirely; the remainder is consolidated into one 
probe set. As pointed out in Ref. (31), typically more than 
half of the data from hybridization to an array is 
eliminated. The number of genes represented by the final 
probe sets is reduced as well (Table 6). While dcLeeuw 
et al. (31) salvage these lost probe sets by creating 
hybrid CDFs and annotating 'less rehable probe-sets', 
SCOREM takes the approach of examining probe set 
behavior in each data set, and annotating the rehabihty 
of those whose behavior is inconsistent with the more 
rehable of its siblings. It is not surprising, given the loss 
of data in the custom CDF approach, that SCOREM 
finds more differentially expressed genes and shows 
greater significance of enriched functional categories 
(Table 5). 



The custom CDF approach has been favorably 
reviewed (32) and is currently is use by others, e.g. 
Carroll et al. (33). However, it does have a number of 
limitations. First, it can only be applied to raw data 
(Figure 2a); while gene expression data in GEO are in- 
creasingly available in raw format, many data sets are 
still only available in pre-processed form (34). Second, 
as a result of eliminating so many probes, the redefined 
probe sets vary widely in size, from as few as three to over 
100 probes, instead of the original 11, thus in any given 
probe set there may be too few probes left to generate a 
statistically significant result. For example, analysis of the 
GSE9371 data set with SCOREM indicates that all 
three probe sets (33 probes total) for Cox7a21 (Entrez 
Gene 20463) behave in a highly similar fashion, leading 
to a highly significant combined expression change 
(Supplementary Table S2). However, the BrainArray 
custom CDF file eliminates aU but 6 of the 33 probes, 
and consequently does not consider the results for this 
gene to be significant. Third, the new probe set definitions 
are based on the current annotated genomic sequences, 
which are certain to change, especially as more splice 
variants are found. As a result, the entire analysis must 
be repeated each time a new version of the custom CDF is 
released, requiring periodic downloads of large annotation 
packages. Fourth, this approach eliminates non-annotated 
sequences, producing a bias toward well-understood tran- 
scripts, and limits the ability of researchers to identify 
unannotated transcripts that nevertheless may be of 
critical importance in a particular study. Finally, this 
approach is based solely on sequence and not at all on 
behavior, with the same probes being consofidated for 
each gene expression experiment. However, our method 
reveals that redundant probe sets can show independent 
behaviors in different conditions or tissue or cell types. 
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As an example of the effect of combining probe sets 
based only on sequence, the gene Shprh is represented 
by four probe sets on the Affymetrix Mouse 430 2.0 
array. The BrainArray custom CDF retains and 
combines 35 of the 44 probes for these probe sets. 
SCOREM, however, reveals that two of the probe sets 
are not hybridizing at all, while the other two uniquely 
map to (and hybridize with) each of two different 
known isoforms (Figure 3a). As a result, the BrainArray 
analysis concludes this gene is not differentially expressed, 
while SCOREM reveals that both isoforms are, but in 
different directions. 

Further applications 

Although in this article we have only presented the results 
of applying SCOREM to Affymetrix microarray data, 
SCOREM is platform-independent and can be applied 
to any type of data containing redundant measures 
where a single value is desired. All that is needed is an 
ExpressionSet object containing normalized data and an 
MArrayLM object with P-values, such as produced by 
eBayes. In the case of the discordant behaviors where 
one probe set or subgroup is reporting differentially ex- 
pression and other subgroup(s) for that gene are not, or 
where the groups are reporting differential expression of 
different magnitudes (Table 4, columns a and b), the 
post hoc analysis described in Results can also be 
applied. Finally, results of a Blat analysis can be down- 
loaded from the UCSC website, and visualized with a tool 
such as the Integrated Genome Browser (IGB) (35). 
Therefore, SCOREM can be apphed to any type of data 
and make sense of any type of discordant behavior present 
in that data. 



CONCLUSIONS 

The SCOREM algorithm has several advantages, both 
technical and biological, over the other methods 
examined. Its technical advantages include that: (i) it is 
applicable to raw data (.CEL) or preprocessed data; (ii) 
its basic analysis does not change, even when sequences 
are re-annotated, therefore, only the post hoc analysis 
must be repeated when new information becomes avail- 
able; (iii) probe sets that do not map to currently known 
coding sequences are retained for future analysis, not dis- 
carded; (iv) it does not require downloading of multiple, 
large annotation packages from external sources, instead 
utilizing the latest annotation obtained directly from 
NCBI; and (v) it is applicable to any type of expression 
data with redundant expression measures, regardless of 
platform. 

SCOREM also has several key biological advantages 
over statistical-only or sequencing-only approaches, 
including that: (i) it identifies differential RNA processing 
of known isoforms, and it detects potential novel 
isoforms; (ii) it makes no a priori assumptions that 
behavior will be the same across experiments, allowing 
the detection of tissue- or cell-specific differential process- 
ing; and (iii) when redundant probe sets behave 
concordantly, it provides greater statistical confidence in 



measuring differential expression and in subsequent func- 
tional enrichment analyses. 

By assessing redundant probe sets based on their 
behavior, not just their sequence, we achieve an increase 
in statistical power for concordant sets, and an opportun- 
ity for further analysis for discordant sets. Therefore, by 
utilizing the SCOREM approach, we obtain results with 
greater statistical confidence in the determination of dif- 
ferential expression and subsequent analysis of functional 
enrichment, and gain additional information about 
tissue-and isoform-specific behavior of genes. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1-3, Supplementary R package file. 
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